Analysing Gene Expression of Breast Cancer Patients

Iben Sommerand s203522
Jonas Sennek s203516
Emilie Wenner s193602
Torbjørn Bak Regueira s203555
Vedis Arntzen s203546

Introduction

  • 2 296 840 new breast cancer cases were diagnosed worldwide in 2022 [1].

  • Aim of the project: explore and analyse patterns in breast cancer data using gene expression and phenotypic traits.

Materials and Methods

  • The analysis was performed on the “GDC TCGA Breast Cancer (BRCA)” dataset from xenabrowser.net.

  • Our data:

    • Gene expression (RNAseq) and phenotype metadata
  • Analytical methods:

    • Descriptive data analysis, principal component analysis (PCA) and linear modelling

Figure 1: Flowchart presenting an overview of the process from raw data to augmented data

Descriptive analysis: Overview of the data

                          Alive             Dead              Overall
                          (N=2050000)       (N=400000)        (N=2450000)
gender
  female                  2026000 (98.8%)   398000 (99.5%)    2424000 (98.9%)
  male                    24000 (1.2%)      2000 (0.5%)       26000 (1.1%)
ethnicity
  hispanic or latino      76000 (3.7%)      2000 (0.5%)       78000 (3.2%)
  not hispanic or latino  1596000 (77.9%)   376000 (94.0%)    1972000 (80.5%)
  not reported            378000 (18.4%)    22000 (5.5%)      400000 (16.3%)
age_group
  (20,30]                 16000 (0.8%)      2000 (0.5%)       18000 (0.7%)
  (30,40]                 116000 (5.7%)     34000 (8.5%)      150000 (6.1%)
  (40,50]                 424000 (20.7%)    72000 (18.0%)     496000 (20.2%)
  (50,60]                 550000 (26.8%)    72000 (18.0%)     622000 (25.4%)
  (60,70]                 534000 (26.0%)    94000 (23.5%)     628000 (25.6%)
  (70,80]                 270000 (13.2%)    74000 (18.5%)     344000 (14.0%)
  (80,90]                 108000 (5.3%)     52000 (13.0%)     160000 (6.5%)
  Missing                 32000 (1.6%)      0 (0%)            32000 (1.3%)
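
The age groups in the table are right-closed ten-year intervals, and each percentage is taken within its vital-status column. A minimal Python sketch of that bookkeeping is given here. The field names `vital_status` and `age`, the helper names and the toy records are assumptions for illustration only; they are not the project's actual code or data.

```python
import math
from collections import Counter


def age_group(age, width=10):
    """Label an age with a right-closed bin such as '(20,30]'."""
    if age is None:
        return "Missing"
    # Right-closed: 30 falls in (20,30], while 31 falls in (30,40].
    upper = math.ceil(age / width) * width
    return f"({upper - width},{upper}]"


def summarise(records):
    """Count age groups per vital status and add column percentages."""
    counts = Counter((r["vital_status"], age_group(r["age"])) for r in records)
    totals = Counter(r["vital_status"] for r in records)
    return {
        key: (n, round(100 * n / totals[key[0]], 1))
        for key, n in counts.items()
    }


# Hypothetical toy records, not taken from the TCGA dataset.
records = [
    {"vital_status": "Alive", "age": 30},
    {"vital_status": "Alive", "age": 31},
    {"vital_status": "Alive", "age": 45},
    {"vital_status": "Alive", "age": None},
    {"vital_status": "Dead", "age": 62},
    {"vital_status": "Dead", "age": 85},
]
table = summarise(records)
for (status, group), (n, pct) in sorted(table.items()):
    print(f"{status:6} {group:8} {n} ({pct}%)")
```

Computing the percentage against the column total (all "Alive" or all "Dead" records) is what makes each column of the table sum to 100%.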

Figure 2: Gender and ethnicity distribution within the data

Figure 3: Cancer stage distribution within the data

Descriptive analysis: Vital status

Figure 4: Vital status by cancer type

Figure 5: Vital status by age

Figure 8: Vital status by cancer type

Figure 9: Vital status by age

Analysis: Investigating cancer stages

Figure 10: Survival time by Cancer Stage

Figure 11: Vital Status by Cancer Stage

Analysis: Linear modelling

Figure 6: Age at Diagnosis vs Years to Death

Figure 7: Predicted vs Actual Years to Death by Treatment
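
As an illustration of the kind of model behind a plot of age at diagnosis against years to death, the sketch below fits a one-predictor least-squares line in Python. The slides do not state the model specification, so the single predictor, the function names and the toy numbers are all assumptions; a model that also accounts for treatment would need additional terms.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy values, not taken from the TCGA dataset.
age_at_diagnosis = [40, 50, 60, 70]
years_to_death = [8.0, 6.0, 4.0, 2.0]

intercept, slope = fit_line(age_at_diagnosis, years_to_death)
predicted = [intercept + slope * a for a in age_at_diagnosis]
r2 = r_squared(age_at_diagnosis, years_to_death, intercept, slope)
```

Plotting `predicted` against `years_to_death` gives a predicted-versus-actual view; points close to the diagonal indicate a good fit.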

Analysis: PCA

Figure 12: Principal Component Analysis

Figure 13: Scree plot
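
A scree plot shows the share of total variance carried by each principal component. The Python sketch below computes those shares from a list of component variances (the eigenvalues of the covariance matrix). The function name and the example variances are assumptions for illustration and do not come from the project's PCA.

```python
def explained_variance(variances):
    """Return per-component and cumulative shares of total variance."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    # Components are conventionally ordered from largest to smallest variance.
    ordered = sorted(variances, reverse=True)
    ratios = [v / total for v in ordered]
    cumulative = []
    running = 0.0
    for ratio in ratios:
        running += ratio
        cumulative.append(running)
    return ratios, cumulative


# Hypothetical component variances, not results from the TCGA data.
ratios, cumulative = explained_variance([4.0, 2.0, 1.0, 1.0])
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.3f} (cumulative {c:.3f})")
```

The per-component shares are the bar heights of the scree plot; the cumulative shares help decide how many components to keep.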

Discussion

  • Detecting the cancer at an early stage appears to be associated with a higher chance of survival

  • Limitations and future work

    • Compare against healthy tissue samples (e.g. GTEx)